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Abstract 

We present sixth- and eighth-order Hermite integrators for astrophysical iV-body 
simulations, which use the derivatives of accelerations up to second order {snap) 
and third order {crackle). These schemes do not require previous values for the cor- 
rector, and require only one previous value to construct the predictor. Thus, they 
are fairly easy to implement. The additional cost of the calculation of the higher 
order derivatives is not very high. Even for the eighth-order scheme, the number of 
floating-point operations for force calculation is only about two times larger than 
that for traditional fourth-order Hermite scheme. The sixth order scheme is better 
than the traditional fourth order scheme for most cases. When the required accuracy 
is very high, the eighth-order one is the best. These high-order schemes have several 
practical advantages. For example, they allow a larger number of particles to be in- 
tegrated in parallel than the fourth-order scheme does, resulting in higher execution 
efficiency in both general-purpose parallel computers and GRAPE systems. 

Key words: Methods: iV-body simulations, Stellar dynamics 
PACS: 95.10.ce, 98.10,+z 



1 Introduction 



Aarsethl (119631 ) introduced what is now called the "Aarseth scheme" or "the 
standard scheme" for the direct integration of gravitational iV-body systems. It 
is a combination of the individual timestep algorithm, which allows individual 
particles to have their own times and timesteps, and variable-stepsize fourth- 
order Adams-Moulton predictor-corrector scheme. 

The basic idea of individual timestep scheme is as follows. When particle % is 
integrated from its time U to its new time U + At,, the calculation of acceler- 
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ation is done only at its new time, and positions of all other particles at that 
time are "predicted" in some way. Thus, Adams-Moulton predictor-corrector 
schemes in the PEC (predict-evaluate-correct) mode with variable stepsize are 
suitable for the individual timestep algorithm, because they require the accel- 
eration calculation only at the end of the timestep. In addition, in the PEC 
mode, the acceleration can be calculated from predicted variables. 



If we do not use the individual timestep algorithm, we can easily change 
timesteps if we use single-step integration schemes such as Runge-Kutta meth- 
ods. However, Runge-Kutta schemes cannot be combined with the individual 
timestep algorithm, because they require the calculation of accelerations in 
intermediate points. In the case of two particles with different timesteps, in 
order to integrate the particle with longer timestep, we need the position of the 
other particle in the past. However, with usual implementation of the individ- 
ual timestep algorithm, such past data is not available. In principle, we coul d 
keep the past trajectory of particles as demonstrated by lMakino et al.l (120061 ). 
Such schemes are no t yet w idely used, though a sample implementation does 
exist flHut & Makinol . l2007h . 



The fourth-order Aarseth scheme had been the method of choice for the time 
integration of gravitational iV-body systems. However, the optimal value for 
the order of the integration scheme has not been known. iMakinol (1l991aT ) 
implemented the Aarseth scheme with an arbitrary order, and performed a 
systematic test of the accuracy. He found that the optimal choice of the order 
weekly depends on the required accuracy, and if the required accuracy is very 
high orders higher than 4 would give better results. However, he also found 
that the fourth-order scheme is close to optimal for practical values of required 
accuracy. His result, however, is for a pure individual timestep algorithm, for 
which the calculation cost of the acceleration depends on the order of the 
integrator, thro ugh the calculatio n cost of p redictors f or part icles other than 



that integrated. iMcMillan I (119861 ) and later iMakino I (Il991bl ) introduced the 
so-called blockstep scheme, in which the timesteps of particles are quantized 
to powers of two so that multiple particles share exactly the same time. With 
this blockstep scheme, the calculation cost of predictors becomes much smaller 
than that of the force calculation for any practical value of the order of the 
integration scheme, and therefore high-order schemes become more efficient 
than in the case of the original individual timestep algorithm. 



Makinol (Il991al ) also introduced the concept of Hermite scheme, in which the 
first time derivative of the acceleration is directly calculated and used to con- 
struct the interpolation (actually the extrapolatio n for the predictor) poly- 



nomial. As discussed in IMakino fc Aarseth I (119921 ). the fourth-order Hermite 



scheme has an extra advantage that it is quite simple to implement. The 
predictor polynomial for fourth-order schemes must be at least third order for 
position and second order for velocity. Fortunately, the directly calculated first 
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time derivative of the acceleration (Jerk) is just sufficient to construct predic- 
tors. In other words, fourth-order Hermite scheme is effectively a single-step 
algorithm which does not require the memory of previous timesteps. 



Another advantage of the fourth-order Hermite scheme is that it is time- 
symmetric, when used with the correct-to-convergence mode. This feature has 
been used to achiev e effective time-sym metry for the integration of internal 
motio ns of binaries (Funato et al. . 1997T) or nearly-circular orbits of planetesi- 
mals ( Kokubo et al. . 19981 ). Also, in ( Hut fc Makino . 2007 ). a time- symmetric 
individual timestep algorithm with Hermite scheme have been implemented. 



The calculation cost of the Hermite scheme per timestep is somewhat higher 
than that of the Aarseth scheme, since the jerk must be calculated as well 
as the acceleration. However, roughly speaking the Hermite scheme allows 
the timestep larger than that for the Aarseth scheme by almost a factor of 
two, while increase in the calculation cost seems to be less than a factor of 
two. Thus, by switching from the fourth-order Aarseth scheme to the fourth 
order-Hermite scheme, effective gain in calculation speed is achieved while the 
calculation program becomes simpler. This combined effect is the reason why 
the fourth-order Hermite scheme is now widely used. 



The result shown in iMakinol (11991al ) implies that for blockstep algorithms 
higher-order schemes might be more efficient. In this paper, we construct 
higher-order generalization of the fourth-order Hermite scheme and report 
their performance. 



There are two different ways to construct higher-order generalization of the 
Hermite scheme. The first one is to use previous timesteps, in the same way a s 
in the original Aarseth scheme. This method was described in IMakinol (11991al ). 
The other is to use even higher derivatives directly calculated, while still using 
only two points in time. Of course, it is possible to combine these two methods. 



To our knowledge, there have been no published work on the latter approach 
combined with the individual timestep algorithm. At first sight, it looks non- 
trivial to combine the direct calculation of the higher-order derivatives and 
individual timestep algorithm. In section 2, we show that the combination is 
actually possible and that it is not much difficult compared to the original 
fourth-order Hermite scheme. In section 3, we present the result of numerical 
experiments, and section 4 is for discussions. 
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2 Sixth- and Eighth-Order Hermite scheme 



2. 1 Basic structure of individual timestep scheme 



In the individual timestep scheme, particle % has its own time (ti), timestep 
(Aij), position (xi) and velocity (vi) at time ij, and acceleration (a,) and time 
derivative(s) of acceleration (dj,a.j,...) calculated at time U. The integration 
proceeds according to the following steps: 

(1) Select particle % with a minimum t L + Atj. Set the global time it) to be 
this minimum, ti + Atj. 

(2) Predict the positions and necessary time derivatives of all particles at 
time t using the predictor polynomials. 

(3) Calculate the acceleration and its time derivative(s) for particle i at time 
t, using the predicted positions etc. 

(4) Construct higher order time derivatives using the Hermite interpolation 
based on the new values of acceleration and its derivatives at time U + Ati 
and those at the previous time U. Apply the corrector to position and 
velocity using these high-order time derivatives, determine new timestep 
Aij, and update time tj. 

(5) Go back to step (1). 

The above description is for the original individual timestep algorithm, and 
we usually use the so-called blockstep algorithms, in which the timesteps are 
quantized to powers of two so that particles of the same stepsize share exactly 
the same time (McMillan 1986). In this way, we can calculate forces on these 
particles in parallel. 

In the following, we present the force calculation formula, predictor, corrector, 
timestep criterion and initialization procedure, in this order. 



2.2 Direct calculation of higher order derivatives 



The gravitational force from particle j to particle i and its first three time 
derivatives are expressed as 
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Table 1 

Number of floating-point operations in force calculation up to potential, accelera- 
tion, jerk, snap and crackle. 



Max derivative 




Total operations 


Operation count 


Potential 


7 add/sub, 4 mul, 1 div, 1 sqrt 


31 


Acceleration 


10 


add/sub, 8 mul, 1 div, 1 sqrt 


38 


Jerk 


21 


add/sub, 19 mul, 1 div, 1 sqrt 


60 


Snap 


39 


add/sub, 38 mul, 1 div, 1 sqrt 


97 


Crackle 


61 


add/sub, 63 mul, 1 div, 1 sqrt 


144 



(1) 

ZaAij, (2) 
6aJij - 3f3Aij, (3) 

9aSij - 9(3 J ij - 37 A^. (4) 

Here, we call the first four time derivatives of the acceleration jerk, snap, 
crackle and pop, and a, (3 and 7 are given by 



Aij — rrij 3 , 
ij 

J - ^ 

J ij — rrij 

ij 

Sij = rrij -J- ■ 

ij 

s~ 1 3 jj 

L ij ~ 171 j Z3~ ' 

ij 



a - 



J: 



V ij ' V ij 



10 

I v ij I + Vjj ■ a>ij 2 

ry I Lc 5 



3v ir a t ,+ r trJlJ + ^ _ 4 ^ 2) 



(5) 
(6) 
(7) 



where r i} Vi, j i and rrii are the position, velocity, total acceleration, total 



jerk and mass of particle i, a nd 
3ij = jj - Ji d Aarseth j, [iooj . 



Vi 



Vi, 0"ij 



a, — a } - and 



Table [T] shows the number of floating point operations needed to calculate 
the terms from potential to crackle. We used the conversion factor 20 for the 
operation counts for one d ivision and one squar e root operations, following the 
convention introduced by IWarren et al. I (119971 ). Compared to the calculation 
up to jerk, the increase of the operation count for higher order terms is rather 
modest. Even the calculation up to crackle is only about a factor of two more 
expensive than that up to jerk. Thus, if the eighth-order scheme allows two 
times larger timestep than that for fourth order scheme for the same accuracy, 
the eighth-order scheme is more efficient. Of course, the CPU time is not 
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directly proportional to the number of floating point operations, and therefore 
the actual efficiency might be somewhat different. 



2. 3 The necessary orders of predictor and corrector 

In the case of fourth-order Hermite scheme, we used two points in time and 
acceleration a and jerk j. To construct a sixth-order scheme, we need to add 
one more term, snap s, to the corrector. With two evaluations, at the beginning 
and at the end of the timestep, of the three variables a, j and s, we indeed 
have the six values needed for a sixth-order scheme. 

One practical question is how to construct the predictor. In the case of the 
fourth-order scheme, it is sufficient to use the terms up to jerk, since the lead- 
ing error of the predicted position then becomes 0(At 4 ), which is consistent 
with the order of the integrator (Aarseth 1963). In the case of the sixth-order 
scheme, we need the predictor with terms up to crackle (third derivative of 
the acceleration), to be consistent. On the other hand, the corrector requires 
terms only up to snap. Therefore, we directly calculate the derivatives only 
up to snap, and evaluate the crackle using Hermite interpolation. The inter- 
polation formula for crackle, as well as those for fifth and sixth-order terms, 
are given in Appendix IA.1I Those for the eighth-order scheme are given in 
Appendix IA.21 



2-4 Predictor 

The predictor for the sixth-order integrator is given by 

111 1 

r = r , + ViAt + -aiAt 2 + -7,-At 3 H s { At 4 H c^At 5 , (8) 

2 6 24 120 

v hP = Vi + a,At + ^jAt 2 + ^At 3 + ^At 4 , (9) 
a itP = di+ jtAt + ^s,At 2 + ^c,At 3 . (10) 

Note that we need to predict acceleration, since it is used to calculate snap 
(see equation [3]). For eighth-order scheme, we need to include two additional 
terms for each predictor, and we also need to predict jerk. Since the predictor 
is simply a Taylor expansion, we do not give the specific forms for the eighth- 
order scheme here. 
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2. 5 Corrector 



The sixth-order corrector is given by 



At At 2 At 3 

Vi, c = v ifl + — (a iA + a ifi ) - -j^-Ui,i ~ 3i,o) + T^Ki + s '-o)> ( n ) 

At At 2 At 3 

r i)C = r<, + -y Kc + »<,o) - -jj Ki - ai,o) + + ii,o)- ( 12 ) 

See Appendix IA. II for the details of derivation. Note that we gave the simplest 
form for the corrector of the position, which uses jerks but not snaps. It is 
possible to construct the corrector which use the snaps, but that would not 
change the order of the time integration. For special problems such as integra- 
tion of near-Kepler orbit with constant timestep, appropri ate treatment of this 



highe st-order term improves the behavior of the integrator (IKokubo fc Makino 



20041 ). 



The eighth-order corrector is given by 



At, . 3At 2 .. . . 

Vi.c = v ifl + — (o»,i + a ij0 ) - -jg-W ~ 3i,o) 

At 3 At 4 
+ ~8A ( Si>1 + 8i '°) ~ 1680 ~ ' ^ 

At. . 3 At 2 . . 

r itC = r ifi + —{v itC + Vifi) - «i.o) 

Z zo 

At 3 At 4 
+ + 3i,o) ~ Y^fj( s i.i _ Si >°)- ( 14 ) 

See IA.2I for the details of the derivation. 



2.6 Timestep criterion 



For a high-order integration scheme with adaptive timestep to work properly, 
it is essential to use an appropriate timestep criterion. In this paper we con- 
sider two different timestep criteria. The first one is the generalization of the 
"Aarseth" criterion 



At 



\ 



n 





a 


a( 2 ) 


+ 


a«| 2 




||cxC3) 


+ 


a( 2 )| 2 



(15) 



where is the fcth derivative of acceleration and r] is a parameter which 
controls the accuracy. Usually, a value around 0.02 is used for 77. This criterion 
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is known to work well with fourth-order schemes, but it is also known that it 
works well only with fourth-order sch e mes and does not give good results for 



higher-order schemes (iMakinol . Il991al ) . lAarsethl (120031 ) notes that this criterion 



should be generalized to include the highest derivative available in higher-order 
integrators. 

We tried to generalize the above criterion to higher-orders as 



A ii) 



l/(p-3) 



Ai = »(^3>J (16) 
where 

A {k) = yV^lla^l + l« (fc) l 2 - (17) 
Here, p is the order of the integrator. We moved the accuracy parameter 77 
out of the fractional power, so that the timestep is directly proportional to 77. 
The numerator is the same as that for the Aarseth criterion for the fourth- 
order scheme, and for the denominators we used the terms of highest orders 
available. The fractional power is chosen to give correct dimension of time. 
This criterion should behave reasonably well, since it does reflect the high- 
order terms. 

We also tested a criterion which is based on the error of the predictor 

i/p 



e a 



ol (I 



I a — a p \ 



where a p is the predicted acceleration and a is calculated one. In order to use 
this criterion the acceleration needs to be predicted to the highest order (see 
section [2731 . 



2. 7 Initialization 



One practical advantage of the fourth-order Hermite scheme is that it is effec- 
tively a single-step algorithm and therefore does not need any special initial- 
ization procedure, except that the initial timestep must be chosen differently. 
Unfortunately, this single-step nature is lost when we go to higher orders, since 
higher derivatives for the predictor need to be constructed using a Hermite 
interpolation. 

The simplest implementation of the initialization procedure is just to use 
lower-order predictors for the first timestep and use appropriately small timestep 
In our current implementation, for the startup of the sixth-order integrator we 
use the terms up to crackle directly calculated and the Aarseth criterion (|T5|) . 
as was done in the Aarseth code. Thus, the order of the predictor is consistent. 
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For the eighth-order integrator, we omit the calculation of further derivatives 
and start with ([15"]) with a small accuracy parameter. 



3 Numerical test 

In this section, we report the behavior of the sixth- and eighth-order Hermite 
schemes, for time integration of a 1024-body Plummer model. We used the 
standard units where the total mass of the system and gravitational constants 
are both unity and the total energy of the system is —1/4. For all calculations, 
we used a softened gravitational potential with e = 4/N = 1/256. We used 
the block timestep algorithm timesteps are restricted to be powers of two, and 
we set an upper limit of timestep as 1/16. All calculations were performed in 
IEEE 754 double precision. 

3. 1 Result for short-time integration 

Figure [T] shows the relation between the maximum relative energy error during 
the integration for 10 time unit and the average number of timesteps per 
particle per unit time. To suppress possible effects of the start-up procedure, 
we first integrate the system for 1/8 unit time and measured the maximum 
relative deviation of the energy during the next 10 time units. 

We can clearly see that the error of sixth- and eighth-order schemes are pro- 
portional to At 6 and At 8 , as expected. For the relative accuracy of 10 -8 , 
the sixth-order scheme allows the average timestep which is almost a factor 
of three larger than that necessary for the fourth-order scheme. For the rel- 
ative accuracy of 10 -11 , the eights-order scheme allows the average timestep 
which is about a factor of seven larger than that necessary for the fourth-order 
scheme. Even for the relatively low accuracy of 10~ 6 , the sixth-order scheme 
allows about a factor of two larger timestep than the fourth-order scheme does. 

Among the three schemes (different predictor orders and different timestep 
criteria), the difference in the achieved accuracy is not very large, but there 
are some trends. For example, when we compare the results with low-order 
predictors and that with high-order predictors, low-order predictors give sys- 
tematically better results, at least for the fourth order integrators. One pos- 
sible reason is that the high-order predictor is effectively less time-symmetric 
compared to low-order predictor, simply because it uses the information from 
the previous timestep. When we compare two timestep criteria, formula ({TBI) 
seems to be worse, at least for large stepsizes. Thus, among these three imple- 
mentations, the combination of low order predictor and generalized Aarseth 



9 




100 1000 
average number of steps per unit time per particle 



Fig. 1. Maximum relative deviation of the total energy during the time integration 
for 10 time units, as a function of average number of timesteps per particle per 
unit time. Triangles, squares and pentagons represent the results of 4th-, 6th- and 
8th-order schemes. Open, filled and star-shape symbols indicate the lowest order 
predictor with generalized Aarseth criterion (|16p . full high-order predictors with 
generalized Aarseth criterion, and full high-order predictor with timestep criterion 
based on the predictor error (fT8j) . The three dotted lines indicate the expected 
scaling relations for 4th-, 6th- and 8th-order algorithms (top to bottom). 

criterion seems to be the most safe. It also requires the least amount of floating 
point operations per timestep. 



3.2 Long-term integration 

We integrated the system until the core collapse occurs. Since we used a soft- 
ened potential, a compact core of the size comparable to the softening length 
is formed after the core collapse. We stopped the calculation at that point. 
The accuracy parameter was chosen so that the actual CPU time per unit time 
integration is initially similar. The actual value of the accuracy parameter for 
the criterion ffl~6l) is 0.1, 0.4, and 0.75 for 4, 6 and 8th order schemes. 

Notice that the system has a chaotic nature, and even if we start from slightly 
different two models, the positions of each particle would be completely dif- 
ferent after a long-term integration. 
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Fig. 2. Time evolution of the central density p c for long-term integrations, for three 
different integration orders and two different values of the random seed for the initial 
model. The curves are shifted by factors of 100. 

Figure [2] shows the evolution of the central density p c . The overall behavior is 
similar for all runs. Figure [3] shows the cumulative relative energy error and 
relative energy error per unit time. As expected, higher-order schemes achieve 
higher accuracy, though the number of timestep per unit time is smaller. With 
all schemes, the error per unit time becomes larger as the system evolves. 
However, this increase is smaller for higher-order schemes. If we compare the 
error per unit time at the beginning and the end of the calculation, with 
fourth-order scheme the error at the end is bigger by nearly three orders of 
magnitude. In the case of the eighth-order scheme, the increase is only around 
a factor of 10. In the case of the sixth-order scheme, the increase is in between 
those for fourth- and eighth-order schemes. 

This result seems to suggest that the higher-order schemes are actually more 
robust than the fourth-order scheme. However, it is not clear from where 
such a difference comes from. Naively, the increase of the integration error is 
understood as coming from particles in the core, which need to be integrated 
for a large number of orbits since the orbital timescale is short. Since the 
structure of the system is not much different, orbital timescales of particles in 
the core should not depend on the integration schemes used, and there is no 
reason for the errors to behave differently. 
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Fig. 3. Total energy error (\E(t) — _E(0)|/|2£(0)|, dashed curve) and energy change 
in one unit time (\(E(t) — E(t — 1)|/|£'(0)|, solid curve) in 4th, 6th and 8th order 
integrators. 



Figure H] shows the average number of timesteps per particles per unit time 
< n steps > (top panel) and the average number of particles integrated in one 
blockstep < rib > (bottom panel). We can see that the increase in the number 
of timesteps is the largest for the eighth-order scheme, and this increase in 
the number of timesteps seems to be the reason for the small error in the 
later time shown in figure [3j Another notable behavior of the higher-order 
schemes is that the average number of particles integrated in one blockstep 
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Fig. 4. The average number of steps per particle per unit time < n steps > (top) 
and the average number of particles advanced in one block-step < > (bottom). 

is larger in higher-order schemes. This means that the timestep criteria for 
Hermite schemes with different orders respond differently to the change of 
the structure of the system. In the following, we examine the reason of this 
behavior of the timesteps. 



In order to examine the behavior of the timestep criterion for different orders, 
it is necessary to calculate the timesteps with different orders for identical dis- 
tribution of particles. We can of course do this for initial condition, but exact 
comparison is impossible for the later times. In order to make the comparison, 
we used the system integrated with the eighth-order scheme, and calculated 
the timesteps with fourth- and sixth-order criteria using the derivatives ob- 
tained by the eighth-order Hermite interpolation. To determine the stepsize, 
we used the accuracy parameter r\ same as that used in the actual time inte- 
gration. 

Figure [5] shows the distribution of timesteps for initial Plummer model and af- 
ter the core collapse (T = 700). We can see that the distribution of timesteps 
depends strongly to the order of the integrator. Even for the initial condi- 
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Fig. 5. Cumulative distribution of timestep for the initial Plummer model (left) 
and that after the core collapse (right). Solid, dashed and dotted curves shows the 
timesteps calculated with fourth-, sixth-, and eighth-order timestep criteria. 
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Fig. 6. Relations of timesteps calculated using the timestep criteria for three inte- 
grations schemes with orders, for the initial Plummer model (left) and after the core 
collapse (right). Top-left, bottom-left and bottom-right panels shows the relations 
between schemes with orders (4-6), (4-8) and (6-8), respectively. 

tion, the range of timesteps of particles for the eighth-order scheme is much 
narrower than that for the fourth-order scheme. This tendency is much more 
pronounced for the system after the core collapse. Thus, for particles with 
smallest timesteps, timestep criteria with different orders give similar values. 
However, particles with large timesteps have very different stepsizes, depend- 
ing on the order of the timestep criterion. 



10"° 10"" 10"' 10" 
At 6th 



Figure M shows actual values of timesteps calculated using timestep criteria of 
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different orders as two-dimensional scattergrams. We can see the correlation 
is quite tight, but at the same time highly nonlinear. While particles with 
small stepsize (those in the central region of the system) have similar stepsizes 
when calculated with different orders, particles with large stepsizes (those in 
the outer region of the system) have very different stepsize depending on the 
order. In particular, it seems high-order schemes have maximum stepsizes 
which depend on the structure of the system. The initial Plummer model 
allows timesteps up to around 0.1, while for the collapsed system timestep 
cannot significantly exceed 0.01 in the case of the eighth-order scheme. In 
other words, the timestep of particles far away from the central region of 
the system somehow shrinks by a factor of 10 in the case of the eighth-order 
criterion, while no such shrinking is visible for the fourth-order criterion. 



3.3 Toy model 



In order to understand this behavior, let us consider a simplified model of the 
collapsed system. Assume that the distribution of particles outside the core 
is isothermal with p oc r -2 , and the core size is r c . Total mass of the system 
within radius r = 1 is 1, and orbital timescale of particles at radius r is r. The 
mass inside radius r is r for r c < r < 1. The number of particles in r < 1 is N 
and mass of particles m is 1/N. In these units, The core mass m c is around 
r c , within a factor of two or so. 

First, we consider the contribution of the nearest neighbor, for a particle at 
r — 1. The distance to the nearest neighbor is roughly r n = 1/N 1 ^ 3 . The 
timescale in which this distance changes is also (r n /v) = 1/iV 1 / 3 , where v is 
the typical relative velocity and is order unity. Therefore, the acceleration and 
its fc-th derivative have the strength of around 

l4 fc) l ~ mr- 2 (r n /v)- k ~ N^/ 3 . (19) 

We can see that the timestep criteria of the form (fT5T) would give similar 
stepsize of l/iV 1//3 for different orders, if the higher order terms are dominated 
by the contributions from near neighbors. To put it in words: if you use a local 
criterion to determine an integration step time scale, you will always get the 
time needed to reach your nearest neighbor. 

Now consider the contribution from one particle at small distance r << 1 
from the center, to a particle at distance one from the center. The strength of 
the acceleration is of the order m = 1/N. The orbital timescale is the larger 
of r and r c . We call this value, the larger of r and r c , as r*. The timescale 
of the change of the acceleration At is the orbital timescale r*, and the frac- 
tional change in the acceleration is also order r* : Aa ~ mr*. Thus the time 
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derivatives of the acceleration have the strength of 

|a?>|~Aa/(At) fc ~mr,.r- fc ~-^ (k > 0). (20) 



If r* < l/iV 1//3 , the contribution to the high order derivative of a particle 
at distance one from the center of the system is larger for a particle with 
distance r < r* from the center than for the nearest neighbor, for sufficiently 
large values of k. For our numerical experiment, the size of the core at the 
core collapse is around 0.003, which is much smaller than 1/iV 1 / 3 . Thus, for 
high enough orders like k — 7, it seems natural that the contributions from 
the particles in the core dominate. For k = 3, the nearest neighbor might still 
be dominant. 

From the viewpoint of the accuracy of the time integration, it seems obvious 
that the behavior of higher order criteria is better. With low-order criteria, 
we effectively ignore the high-frequency variation of the acceleration when 
integrating the orbits of particles far away from the core. As a result, there 
must be the sampling error or aliasing error for the forces from particles in the 
core, which should show up as the integration error. The energy error of one 
particle, due to one particle in the core, in one timestep would be of the order 
of AE ~ mvAaAt, since the distance one particle moves in one timestep is 
mvAt and the force error applied during the time the particle moves is Aa. 
Thus, we have AE ~ Atr c N~ 2 . 

The number of error terms is proportional to the number of particles in the 
system N, number of particles in the core r c N, and the average number of 
timesteps per unit time At -1 . Thus, for one time unit, the total error would 
be of the order of 

A£ sampling ~ y/N ■ r c N ■ At~ l Atr c N~ 2 = N^r 2 ^ At 1 ' 2 . (21) 

Here, we assumed that the errors are random and the total error is proportional 
to the square-root of the number of error terms. Either assumption might not 
be correct. For example, error terms of the forces on different particles from 
one particle in the central region are likely to be correlated. Therefore the 
actual error might be larger. 

In the above, we assumed that the error comes only from the sampling of the 
acceleration. However, with Hermite schemes, what we actually integrate is the 
high order derivatives directly calculated. Thus, the actual error is probably 
much bigger than the estimate above. In the case of the fourth-order Hermite 
scheme which uses the first time derivative, the derivative is of the order 1/N 
from equation (1201 . Therefore, the energy error of one particles is more like 
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AE ~ mv\a^\At 2 ~ At 2 N 2 per timestep, and total error is of order of 

A£ samp i ingJcrk ~ iV-V^At 3 / 2 . (22) 

Note that the only way, for the current individual timestep scheme, to make 
this error reasonably small is to shrink the timestep of all particles in the 
system so that the orbits of core particles are resolved. In that sense, the 
behavior of the eighth-order scheme is numerically correct, and that of fourth- 
order scheme is not. This is probably the reason why the error increases for 
the fourth-order scheme. 



4 Discussion 



4-1 Computational aspects 



In this paper, we present sixth- and eighth-order Hermite schemes to be used 
with individual timestep algorithm, and compared their performance with that 
of the fourth-order scheme. We found that these higher-order schemes do offer 
practical advantages. 

Here we speculate on the merit of higher-order schemes, when used with 
special-purpose computers or parallel implementation of individual timestep 
algorithm on massively-parallel computers. 

One advantage of the direct calculation of higher-order derivatives in hard- 
ware is that calculation of higher-order t erms can be implem ent with short 



word lengths. In the case of GRAPE-4 (IMakino et all Il997f ) or GRAPE-6 



(IMakino et all 120031 ). the calculation of acceleration is done with 24-bit man- 
tissa and jerk with 20-bit mantissa. Similarly, it would be okay to calculate 
snap in 16 bits and crackle in 12 bits. Thus, though the number of operation 
becomes large as we calculate higher-order terms, the silicon area needed does 
not increase much. Even in the case of general-purpose programmable com- 
puters (including GPUs), higher-order schemes have extra advantage, since 
the calculation of high-order terms can be done in lower precision, for exam- 
ple single precision. Many of general-purpose CPUs have extra instructions for 
fast single-precision operations, and GPUs are much faster in single precision 
than in double precision. 

Another advantage is that the number of particles integrated in one blockstep 
is larger for higher-order schemes. Thus, we can enjoy more parallelism with 
higher-order schemes, resulting in higher execution efficiency for both general- 
purpose parallel computers and GRAPEs. 
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Thus, the higher-order schemes are better not just in the pure operation count 
but also in hardware efficiency and parallel efficiency. 



4-2 Physical/mathematical aspects 



In section |3T2| we have seen that, for high-order schemes, timesteps of a parti- 
cles in outer region of the system become short to resolve the high-frequency 
variation of the acceleration due to particles in the central region. In the 
case of low-order schemes, such shrinking did not occur. However, as a result 
the energy error became very large. Thus, it seems that we need to make the 
timestep of all particles small, as the system evolves and the core size becomes 
smaller. This means that the analysis of the calculation cost based on the as- 



sump tion that the nearest neighbor determines the timestep (IMakino &: Hut 



19881 ) is not correct. 



Our conclusion im plies that the individu al timestep algorithm is not so ef- 
fective as shown in lMakino &: Hutl (119881 ). because the maximum timestep of 
particles is larger than that of the shortest timestep by the factor which only 
weakly depends on the number of particles in the system. In principle, how- 
ever, we can reduce the calculation cost by assigning individual timesteps not 
to particles but to interactions. Even when there is a small core with short 
orbital period, the force between two particles both far away from the core 
changes smoothly. Strictly speaking, it is not really smooth, since the position 
and velocity contain the high-frequency terms. However, these high frequency 
terms are much smaller than that of accelerations simply because position and 
velocity are obtained by integrating the acceleration. Thus, if we use relatively 
low-order schemes or the original Adams-type scheme which does not use the 
time derivatives, we might be able to integrate the interaction between two 
particles far away from the core with the timestep much larger than that of 
core particles. We will investigate the possibility of such a scheme in future 
papers. 
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A Notes for implementers 



A.l Interpolation polynomial for the sixth-order integrator 



We consider constructing an interpolation polynomial from acceleration, jerk 
and snap at time t , (a ,j , s ) and those at t 1 , (a±,ji, Si). We define summa- 
tions and differences of them as 



A + = <2i + a , 
A' =ai - a Q , 
J+ = h( 3l + 3o ), 
J~ = h(j! - jo), 
S+ = h 2 ( Sl + s ), 

S- = h 2 ( Sl -s ), (A.l) 

where h — (ti — t )/2 and is the fcth derivative of a. Coefficients of the 
interpolation polynomial at the midpoint t — (to + 1\)/2 are 



a 1/2 = ^-(8A + -5J- + S + ), 
lb 

hj 1/2 = ^(15A~ -7J+ + S-), 
y^i/ 2 = ^(3J--5 + ), 
ja$= 1 -(-5A- + 5J + -S-), 

h 5 (5 ) 1 



120 fll/2 16 

By shifting them to ti, we have derivatives for the next predictor or timestep 
criterion. 



— (3A- -3J+ + S-). (A.2) 



(3) 
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-a {3) 

— tt l/2 


+ ha(j 2 


(4) 
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(4) 

— fl l/2 


+ hafj 2 


(5) 


-a (5) 

— "l/2 




1 





By integrating the even-order terms, we obtain the sixth-order corrector 
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(A.4) 



A. 2 Interpolation polynomial for the eighth-order integrator 



The interpolation polynomial for the eighth-order integrator is constructed 
from (a ,j , s , c ) and (oi, ji, s±, C\). Here c is crackle. Even-order coefficients 
are 



tt l/2 

/i 2 

Y S l/2 
tl (4) 



/l 6 



,(6) 



V 720 "W 
and odd-order ones are 
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(A.6) 



C+ = /,3( Cl + Co); 
C- = h\ Cl -c ). 



(A.7) 



The eighth-order corrector is 



Vl - v = h ( A + - - J' + —S + - —C- 
7 21 105 



(A.8) 
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